Containerization on Petascale HPC Clusters
Containerization technologies provide a mechanism to encapsulate applications and many of their dependencies, facilitating software portability and reproducibility on HPC systems. However, accessing many of the architectural features that enable HPC system performance requires compatibility between certain components of the container and the host, resulting in a trade-off between portability and performance. In this work, we discuss our early experiences running three state-of-the-art containerization technologies on the petascale Frontera system. We present how we build the containers to ensure performance and security, and how they perform at scale. We ran microbenchmarks at a scale of 4,096 nodes and demonstrate near-native performance and minimal memory overhead for the containerized environments at 70,000 processes on 1,296 nodes with the scientific application MILC, a quantum chromodynamics code.
UT Austin-Portugal Program, a collaboration between the Portuguese Foundation of Science and Technology and the University of Texas at Austin, award UTA18-001217; Texas Advanced Computing Center (TACC)
PADLL: Taming Metadata-intensive HPC Jobs Through Dynamic, Application-agnostic QoS Control
Modern I/O applications that run on HPC infrastructures are increasingly
becoming read and metadata intensive. However, having multiple concurrent
applications submitting large amounts of metadata operations can easily
saturate the shared parallel file system's metadata resources, leading to
overall performance degradation and I/O unfairness. We present PADLL, an
application and file system agnostic storage middleware that enables QoS
control of data and metadata workflows in HPC storage systems. It adopts ideas
from Software-Defined Storage, building data plane stages that mediate and rate
limit POSIX requests submitted to the shared file system, and a control plane
that holistically coordinates how all I/O workflows are handled. We demonstrate
its performance and feasibility under multiple QoS policies using synthetic
benchmarks, real-world applications, and traces collected from a production
file system. Results show that PADLL can enforce complex storage QoS policies
over concurrent metadata-aggressive jobs, ensuring fairness and prioritization.
Comment: To appear at the 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid'23)
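
The PADLL abstract describes data plane stages that rate limit POSIX requests and a control plane that adjusts how workflows are handled. As a rough illustration of that idea only, the sketch below uses a token bucket, which is a common rate-limiting technique; the abstract does not say which algorithm PADLL uses. The names `TokenBucket`, `Stage`, `submit`, and `set_rate`, the request class labels, and the choice to reject rather than queue throttled requests are all hypothetical and are not taken from PADLL.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock              # injectable so the limiter can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def set_rate(self, rate):
        # Hypothetical hook a control plane could call to change the limit.
        self._refill()                  # settle tokens at the old rate first
        self.rate = float(rate)

    def try_acquire(self, cost=1.0):
        # Spend `cost` tokens if available; otherwise report throttling.
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class Stage:
    """Hypothetical data-plane stage: one bucket per request class.

    `limits` maps a class label (for example "metadata") to operations per
    second. Classes without an entry pass through unthrottled (an assumption).
    """

    def __init__(self, limits, clock=time.monotonic):
        self.buckets = {cls: TokenBucket(rate, rate, clock)
                        for cls, rate in limits.items()}

    def submit(self, request_class, operation):
        # Returns (admitted, result). Throttled requests are rejected here;
        # a real system might queue or delay them instead.
        bucket = self.buckets.get(request_class)
        if bucket is not None and not bucket.try_acquire():
            return (False, None)
        return (True, operation())
```

A caller would wrap each file system call, for instance `stage.submit("metadata", lambda: os.stat(path))`, and a coordinating component could call `stage.buckets["metadata"].set_rate(...)` to tighten or relax the limit for a job at run time.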